Add GRL Tetris resource server #578

yixinhuang48 · 2026-01-12T21:33:57Z

Contributing To NeMo-Gym (GRL Tetris Resource Server)

1) Necessary information

i. Corresponding dataset on the spreadsheet

N/A

ii. Description of the prompt (source + domain)

Domain: Classic falling-block Tetris (grid-based game; tool-use agent).
Source: Synthetic prompts generated programmatically for board configurations (seeds, sizes, piece sets). Prompts instruct the agent to use the step tool and clear at least one line.

iii. Description of the environment

A vendored, self-contained Tetris environment under resources_servers/grl_tetris/tetris_env, modified from the GRL repo implementation.
Configurable board dimensions (4x4–6x6) and piece sets (box_type 0–3).
Observation: ASCII board with _ empty, # locked, X active piece.
Actions: Left, Right, Down.
FastAPI resource server following NeMo Gym conventions.
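
As an illustration of the observation format above, here is a minimal sketch of how a board could be rendered with the three symbols. The function names and the (row, column) cell representation are assumptions for this example, not the code in tetris_env.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column), row 0 is the top of the board

EMPTY, LOCKED, ACTIVE = "_", "#", "X"


def render_board(width: int, height: int, locked: Set[Cell], active: Set[Cell]) -> str:
    """Return the board as text, one row per line, top row first."""
    rows: List[str] = []
    for r in range(height):
        row = ""
        for c in range(width):
            if (r, c) in active:
                row += ACTIVE
            elif (r, c) in locked:
                row += LOCKED
            else:
                row += EMPTY
        rows.append(row)
    return "\n".join(rows)


def full_rows(width: int, height: int, locked: Set[Cell]) -> List[int]:
    """Return indices of rows whose cells are all locked (candidates for a line clear)."""
    return [r for r in range(height) if all((r, c) in locked for c in range(width))]


# Hypothetical 4x4 board: two locked cells on the bottom row, a piece at the top.
board = render_board(4, 4, locked={(3, 0), (3, 1)}, active={(0, 1), (0, 2)})
print(board)
```

With the example cells above, the top row prints as `_XX_` and the bottom row as `##__`.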

iv. Description of the verifier

The environment is the verifier: success=true when a line clear occurs. The cumulative reward is returned only on success; otherwise the reward is zero.
/verify computes final reward and cleans up per-session state.

v. Legal approval status

Code: Apache 2.0.
Data: Synthetic, programmatically generated (Apache 2.0).
No third-party runtime data included.

2) Simple correctness check

i. Commands used to run the server for the uploaded data

# Start NeMo Gym servers (agent + Tetris)
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/grl_tetris/configs/grl_tetris.yaml"
ng_run "+config_paths=[$config_paths]"

# Collect 5 rollouts with the sample dataset
ng_collect_rollouts +agent_name=grl_tetris_game_agent \
  +input_jsonl_fpath=resources_servers/grl_tetris/data/example.jsonl \
  +output_jsonl_fpath=resources_servers/grl_tetris/data/example_rollouts.jsonl \
  +limit=5

# View rollouts
ng_viewer +jsonl_fpath=resources_servers/grl_tetris/data/example_rollouts.jsonl

ii. Resulting rollout and judges (5 examples)

See resources_servers/grl_tetris/data/example_rollouts.jsonl
Expected behavior:
- Successful line clear → reward ≈ 9.0–9.2, success=true
- No line clear → negative step penalties (e.g., -0.1 per step), success=false
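
To check a rollout file against the expected behavior above, a small script along these lines could help. It is a sketch only: the `reward` and `success` field names are assumptions about the JSONL schema, and the sample records are made up for illustration.

```python
import json
from typing import Dict, Iterable


def summarize_rollouts(lines: Iterable[str]) -> Dict[str, float]:
    """Count rollouts, successes, and mean reward from JSONL lines.

    Assumes each record has top-level "reward" and "success" keys (hypothetical).
    """
    total = 0
    successes = 0
    reward_sum = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        reward_sum += float(record.get("reward", 0.0))
        if record.get("success"):
            successes += 1
    return {
        "rollouts": total,
        "success_rate": successes / total if total else 0.0,
        "mean_reward": reward_sum / total if total else 0.0,
    }


# Illustrative records only; real files come from ng_collect_rollouts.
sample_lines = [
    '{"reward": 9.1, "success": true}',
    '{"reward": -0.5, "success": false}',
]
summary = summarize_rollouts(sample_lines)
print(summary)
```

For a real file, pass an open file handle instead of `sample_lines`, for example `summarize_rollouts(open(path, encoding="utf-8"))`.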

iii. Additional notes for running the server properly

Must call /seed_session before /step.
Actions accepted as labels or indices ("Left", "Right", "Down" or 0/1/2).
Session cookies are maintained by the middleware; the agent path handles cookie propagation automatically.
Large rollout artifacts (e.g., rollouts.jsonl) are gitignored; do not commit them.
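
The label-or-index rule for actions can be sketched as below. This is not the server's parser: the helper name, the index order (0/1/2 for Left/Right/Down), and the case-insensitive matching are assumptions for illustration.

```python
from typing import Union

# Assumed order: index 0/1/2 maps to Left/Right/Down as listed in the note above.
ACTION_LABELS = ("Left", "Right", "Down")


def parse_action(raw: Union[str, int]) -> int:
    """Normalize a label or an index into an action index (hypothetical helper)."""
    if isinstance(raw, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise ValueError(f"Unsupported action: {raw!r}")
    if isinstance(raw, int):
        index = raw
    else:
        text = str(raw).strip()
        if text.isdigit():
            index = int(text)
        else:
            lowered = [label.lower() for label in ACTION_LABELS]
            if text.lower() not in lowered:
                raise ValueError(f"Unknown action label: {raw!r}")
            index = lowered.index(text.lower())
    if not 0 <= index < len(ACTION_LABELS):
        raise ValueError(f"Action index out of range: {index}")
    return index


print(parse_action("Left"), parse_action(2), parse_action("1"))
```

Rejecting unknown labels with an error, rather than defaulting to an action, keeps malformed tool calls visible in the rollout.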

3) Tests

Test files / command to run tests

# Resource server tests
pytest resources_servers/grl_tetris/tests -q

Notes on coverage / responsibilities

Tetris server tests: seed/step flow, action parsing, done handling, verify success/failure, cleanup.
Game agent tests: /v1/responses tool-call loop (done handling), end-to-end /run path that seeds → responds → verifies.

4) Reward profiling

Models

Used in this PR: Qwen3-4B (discussed with @banghuaz-nvidia over Slack)

Method

Test set: 200 prompts, 16 rollouts per prompt (3,200 total).
Tool calling enabled; agent loops until done or max_steps; reward aggregated from env.
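
The loop described above (step until done or max_steps, summing the environment reward) can be sketched as follows. The toy step function and its reward values are invented for illustration and do not reflect the actual Tetris reward scale or the agent code.

```python
from typing import Callable, Tuple

StepFn = Callable[[str], Tuple[float, bool]]  # action -> (reward, done)


def run_episode(policy: Callable[[int], str], step: StepFn, max_steps: int) -> Tuple[float, int, bool]:
    """Call step() until the env reports done or max_steps is reached."""
    total_reward = 0.0
    calls = 0
    done = False
    while not done and calls < max_steps:
        action = policy(calls)
        reward, done = step(action)
        total_reward += reward
        calls += 1
    return total_reward, calls, done


def make_toy_step() -> StepFn:
    """Toy environment: two penalized steps, then a terminal step with a bonus."""
    state = {"calls": 0}

    def step(action: str) -> Tuple[float, bool]:
        state["calls"] += 1
        if state["calls"] < 3:
            return -0.5, False
        return 2.0, True

    return step


always_down = lambda _step_index: "Down"
print(run_episode(always_down, make_toy_step(), max_steps=10))  # finishes on step 3
print(run_episode(always_down, make_toy_step(), max_steps=2))   # cut off by max_steps

# Evaluation size from the method above: 200 prompts x 16 rollouts each.
total_rollouts = 200 * 16
print(total_rollouts)
```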

Commands

# Auto pipeline (Qwen3-4B): runs vLLM + servers + collection + analysis
cd resources_servers/grl_tetris
./run_qwen3_4b_eval_loop.sh  # or ./run_qwen3_4b_eval.sh

# Manual analysis (any model/output)
python analyze_rewards.py \
  --rollouts-path resources_servers/grl_tetris/data/qwen3_4b_eval/rollouts.jsonl \
  --model-name "Qwen3-4B" \
  --output resources_servers/grl_tetris/data/qwen3_4b_eval/reward-analysis.md

Results (Qwen3-4B, 3,200 rollouts)

Success rate: 5.09% (163/3,200)
Mean reward: -0.29 (min -2.00, max 19.20; median -0.80)
Average tool calls/rollout: 7.48
Tool calls ↔ reward correlation: -0.06 (weak negative)
Report: resources_servers/grl_tetris/data/qwen3_4b_eval/reward-analysis.md
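
For reference, the success rate and a Pearson correlation of the kind reported above can be computed as in this sketch. It is not the code in analyze_rewards.py, and whether that script uses Pearson correlation is an assumption; the small input lists are placeholders.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Success rate from the reported counts: 163 successes out of 3,200 rollouts.
success_rate = 163 / 3200
print(f"{success_rate:.2%}")

# Placeholder data: per-rollout tool-call counts and rewards.
tool_calls = [1.0, 2.0, 3.0]
rewards = [3.0, 2.0, 1.0]
print(pearson(tool_calls, rewards))
```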

5) Training results

Ran GRPO training with Qwen3-4b-instruct on 3,200 training examples and 800 validation examples (4x4, type 1 Tetris configuration).

copy-pr-bot · 2026-01-12T21:34:00Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yixinhuang48 · 2026-01-12T21:41:02Z

Uses the same modified version of simple_agent app.py as in #564.

yixinhuang48 · 2026-01-12T21:42:58Z

@cmunley1 @bxyu-nvidia this PR is the updated version of the one I closed (#260).

yixinhuang48 force-pushed the feature/grl-tetris-integration branch 4 times, most recently from 287bfb6 to 5cae458 (January 12, 2026 21:40)

Add GRL Tetris resource server

15e962f

- resources_servers/grl_tetris: environment, config, tests, data
- Tetris game environment with step/verify endpoints
- Example data and test examples generator

Verified DCO and cryptographic signing.

Signed-off-by: yixin <yixinhuang48@gmail.com>

yixinhuang48 force-pushed the feature/grl-tetris-integration branch from 5cae458 to 15e962f (January 12, 2026 21:44)

Merge branch 'main' into feature/grl-tetris-integration

ad8c879
